Syntax-based Statistical Machine Translation
Abstract
In its early development, machine translation adopted rule-based approaches, which can include the use of language syntax. The late 1980s and early 1990s saw the inception of the statistical machine translation (SMT) approach, in which translation models are learned automatically from a parallel corpus rather than created manually by humans. Initial SMT models were word-based and phrase-based, and used no syntactic knowledge. In phrase-based SMT, a source sentence is first segmented into phrases and then translated phrase by phrase, with some reordering of the translated phrases in the target sentence. This approach poses challenges when translating between two syntactically different languages. Syntax-based SMT approaches take advantage of syntactic knowledge within the framework of SMT. This book provides an introduction to syntax-based SMT approaches and is a valuable resource for those interested in the topic.
The book consists of seven chapters. It has no introductory chapter; the preface can be considered a brief introduction, and readers are referred to Koehn (2010) for background knowledge. I think an introductory chapter divided into sections would have been useful before proceeding to describe the various models. The first two chapters provide principles applicable across various syntax-based SMT approaches. The next three chapters describe syntax-based SMT decoding in detail; this constitutes half of the book. Selected extended topics are provided in the next chapter, which is followed by a concluding chapter.
Chapter 1 describes the models and formalisms applicable to syntax-based SMT. The first section describes the phrasal translation units in phrase-based SMT, their limitations, and how tree structures address those limitations. This explanation is useful, as translation units are the key difference between the phrase-based and syntax-based SMT approaches.
The next two sections describe the grammar formalisms and the statistical models that define syntax-based SMT. The section that covers the grammar formalisms (i.e., synchronous context-free grammar [SCFG] and synchronous tree-substitution grammar [STSG]) would have been clearer if their differences had been presented in a side-by-side illustrative example. The remainder of the chapter discusses the different categories of syntax-based SMT approaches and the history of these approaches, which include string-to-string, string-to-tree, tree-to-string, and
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated; the hyphen separates the different parts. Persian also has multi-part words. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
A Dependency Edge-based Transfer Model for Statistical Machine Translation
Previous models in syntax-based statistical machine translation usually resort to some kind of synchronous procedure; few of these works are based on the analysis-transfer-generation methodology. In this paper, we present a statistical implementation of the analysis-transfer-generation methodology in rule-based translation. The procedures of syntax analysis, syntax transfer and language genera...
A New Subtree-Transfer Approach to Syntax-Based Reordering for Statistical Machine Translation
In this paper we address the problem of translating between languages with word order disparity. The idea of augmenting statistical machine translation (SMT) by using a syntax-based reordering step prior to translation, proposed in recent years, has been quite successful in improving translation quality. We present a new technique for extracting syntax-based reordering rules, which are derived ...
Edinburgh's Syntax-Based Machine Translation Systems
We present the syntax-based string-to-tree statistical machine translation systems built for the WMT 2013 shared translation task. Systems were developed for four language pairs. We report on adapting parameters, targeted reduction of the tuning set, and post-evaluation experiments on rule binarization and preventing dropping of verbs.
Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation
We begin by exploring theoretical and practical issues with phrasal SMT, several of which are addressed by syntax-based SMT. Next, to address problems not handled by syntax, we propose the concept of a Minimal Translation Unit (MTU) and develop MTU sequence models. Finally we incorporate these models into a syntax-based SMT system and demonstrate that it improves on the state of the art transla...
N-Gram-Based Statistical Machine Translation versus Syntax Augmented Machine Translation: Comparison and System Combination
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual ...